PHP RFC: Typed Properties 2.0
- Date: 2018-06-15
- Author: Bob Weinand bwoebi@php.net, Nikita Popov nikic@php.net
- Based on previous RFC by: Joe Watkins krakjoe@php.net, Phil Sturgeon philstu@php.net
- Proposed PHP version: PHP 7.4
- Implementation: https://github.com/php/php-src/pull/3313
- Status: Implemented (in PHP 7.4)
Introduction
With the introduction of scalar types and return types, PHP 7 greatly increased the power of PHP's type system. However, it is currently not possible to declare types for class properties, forcing developers to instead use getter and setter methods to enforce type contracts. This requires unnecessary boilerplate, makes usage less ergonomic and hurts performance. This RFC resolves this issue by introducing support for first-class property type declarations.
Under this RFC, code like
class User {
    /** @var int $id */
    private $id;
    /** @var string $name */
    private $name;

    public function __construct(int $id, string $name) {
        $this->id = $id;
        $this->name = $name;
    }

    public function getId(): int {
        return $this->id;
    }
    public function setId(int $id): void {
        $this->id = $id;
    }

    public function getName(): string {
        return $this->name;
    }
    public function setName(string $name): void {
        $this->name = $name;
    }
}
might be written as
class User {
    public int $id;
    public string $name;

    public function __construct(int $id, string $name) {
        $this->id = $id;
        $this->name = $name;
    }
}
without sacrificing any type-safety.
Main differences to previous proposal
Typed properties have been proposed previously and were declined at the time. The new proposal comes with two major differences, which we believe address most of the concerns regarding the previous proposal:
- Types on static properties are supported. Support for static property types was not included in the previous RFC due to implementation issues, creating an inconsistency in the language. Under the new proposal type declarations may be added to static properties, with the same semantics as for normal properties.
- References to typed properties are supported. The previous proposal did not permit references to typed properties, due to the difficulty of enforcing the type if the property is modified indirectly through the reference. The new proposal allows taking references to typed properties and will enforce the declared type even if the modification happens through a reference.
Proposal
This RFC adds support for runtime-enforced type annotations for declared properties. The following example illustrates the basic syntax:
class Example {
    // All types with the exception of "void" and "callable" are supported
    public int $scalarType;
    protected ClassName $classType;
    private ?ClassName $nullableClassType;

    // Types are also legal on static properties
    public static iterable $staticProp;

    // Types can also be used with the "var" notation
    var bool $flag;

    // Typed properties may have default values (more below)
    public string $str = "foo";
    public ?string $nullableStr = null;

    // The type applies to all properties in one declaration
    public float $x, $y;
    // equivalent to:
    public float $x;
    public float $y;
}
For a discussion of the syntax choice, see the Alternatives section.
The fundamental invariant maintained by property type declarations is that a property read will always either return a value that satisfies the declared type, or throw. While this sounds straightforward, the idiosyncrasies of the PHP language make enforcing this invariant non-trivial.
In the following, the semantics of property type declarations are laid out in detail.
Supported Types
Property type declarations support all type declarations supported by PHP, with the exception of void and callable.
The void type is not supported, because it is not useful and has unclear semantics. Under a strict interpretation, properties of type void could be neither read from nor written to, as there is no way to construct a value of type void in PHP. Under a looser interpretation (consistent with the fact that we allow using the return values of void functions) a property of type void could only hold the value null. As neither variant appears to be useful, we do not allow void properties. This is consistent with parameter type annotations.
The callable type is not supported, because its behavior is context dependent. The following example illustrates the issue:
class Test {
    public callable $cb;

    public function __construct() {
        // $this->cb is callable here
        $this->cb = [$this, 'method'];
    }

    private function method() {}
}

$obj = new Test;
// $obj->cb is NOT callable here
($obj->cb)();
This means that it is possible to write a legal value to a property and then proceed to read an illegal value from the same property. This fundamental problem of the callable pseudo-type is laid out in much more detail in the consistent callables RFC.
The recommended workaround is to instead use the Closure type, in conjunction with Closure::fromCallable(). This ensures that the callable will remain callable independent of scope. For a discussion of alternative ways to handle the callable issue, see the Alternatives section.
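As a minimal sketch of the Closure workaround, the example below stores a private method in a Closure-typed property. The class Button and its members are hypothetical names chosen for illustration; only the Closure type and Closure::fromCallable() come from the text above.

```php
<?php
// Hypothetical class for illustration; not taken from the RFC.
class Button
{
    public Closure $onClick;

    public function __construct()
    {
        // The conversion happens inside the class scope, where the private
        // method is accessible, so the resulting Closure stays invocable
        // from any scope.
        $this->onClick = Closure::fromCallable([$this, 'handle']);
    }

    private function handle(): string
    {
        return 'clicked';
    }
}

$button = new Button;
// Invoked from outside the class, where [$button, 'handle'] is not callable.
$result = ($button->onClick)();
```

The conversion cost is paid once at the write-site, and every later read yields a value that is still callable.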
The following list contains all types supported at the time of this writing:
bool, int, float, string, array, object
iterable
self, parent
any class or interface name
?type // where "type" may be any of the above
The parent type may be used in classes that do not have a parent, consistent with parameter and return type declarations.
Strict and Coercive Typing Modes
Just like parameter and return type declarations, property types are affected by the strict_types directive. If strict_types=1 at the location of a property write, then the assigned value must satisfy the declared type exactly, with the usual exception of implicit int to float casts. If strict_types=0 at the location of the property write, then the usual rules for coercive type checks are followed. In both cases, the final value stored inside the property will always satisfy the declared type.
As the following example illustrates, only the strict_types mode at the write-site of the property is relevant. The strict_types mode at the declaration-site of the property has no impact on behavior.
// file1.php
declare(strict_types=1);

class Test {
    public int $val;
}

$test = new Test;
$test->val = "42"; // Throws TypeError

// file2.php
declare(strict_types=0);

$test = new Test;
$test->val = "42";
var_dump($test->val); // int(42)
Consistent with the handling of parameter and return types, code inside internal functions is always considered to be in coercive mode:
declare(strict_types=1);

class Test {
    public int $val;
}

$test = new Test;
$rp = new ReflectionProperty(Test::class, 'val');
$rp->setValue($test, "42"); // Property set by internal code
var_dump($test->val); // int(42)
The property assignment inside ReflectionProperty::setValue() occurs inside internal code, which uses coercive mode. As such, the assignment is permitted, even though the code invoking ReflectionProperty::setValue() uses strict typing.
If a value has to be coerced by an assignment to a typed property, then the return value of the assignment is the coerced value, rather than the original one:
class Test {
    public int $val;
}

$test = new Test;
var_dump($test->val = "42"); // int(42)
var_dump($test->val); // int(42)
This is consistent with PHP's semantics of returning the “actually assigned” value for assignment expressions. For example $str[$i] = 1 has return value string(1) "1" rather than int(1). Similarly, assign-modify operations like += also always return the result of the operation. Additionally, in the case of by-reference assignments, the return value is a reference to the assigned storage location, which necessarily holds the coerced value.
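The three cases just described can be observed side by side. This is a small sketch in coercive mode (no strict_types declaration); the class Counter and the variable names are assumptions made for illustration.

```php
<?php
// Hypothetical class for illustration; runs in coercive mode.
class Counter
{
    public int $count = 0;
}

$counter = new Counter;

// The assignment expression evaluates to the coerced value, not to "5".
$fromAssign = ($counter->count = "5");

// Assign-modify operations evaluate to the result of the operation.
$fromAddAssign = ($counter->count += 2);

// Pre-existing behaviour for string offsets: the stored character is returned.
$str = "abc";
$fromOffset = ($str[0] = 1);
```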
Inheritance and Variance
Property types are invariant. This means that the type of a (non-private) property is not allowed to change during inheritance (this includes adding or removing property types). If the parent property is private, then the type may be changed arbitrarily.
class A {
    private bool $a;
    public int $b;
    public ?int $c;
}

class B extends A {
    public string $a; // legal, because A::$a is private
    public ?int $b;   // ILLEGAL
    public int $c;    // ILLEGAL
}
The reason why property types are invariant is that they can be both read from and written to. The change from int to ?int implies that reads from the property may now also return null in addition to integers. The change from ?int to int implies that it is no longer possible to write null to the property. As such, neither contravariance nor covariance is applicable to property types.
In the future, should an additional modifier such as readonly be introduced, it may be possible to relax this restriction for such properties, depending on the exact semantics of the modifier.
Invariance also applies to static properties. The case is less clear cut here, because static properties are usually accessed through an explicit class name, so that A::$prop and B::$prop could have different types without violating the Liskov substitution principle (LSP). However, PHP also supports late static binding (LSB), in which case static::$prop could refer to either A::$prop or B::$prop, and a type change violates LSP. Similarly, access through an object $obj::$prop is possible. Combined with the fact that we also enforce inheritance checks for static methods (where the situation is essentially the same), we opt to treat static and non-static properties consistently here.
It is worth noting that the self and parent types are (as always) resolved relative to the class they are declared in, or, in the case of traits, the class they were imported into. As such, the following code is illegal:
class A {
    public self $prop;
}

class B extends A {
    public self $prop;
}
While textually the property types are the same, the resolved property types for both classes would be A and B respectively, which differ.
If two type declarations refer to aliases of the same class, they are considered equal, provided the alias is known at the time of inheritance checking. Depending on the usual early-binding rules, this may either be at compile-time or at run-time. The following code is legal:
// file1.php
class Foo {}
class_alias('Foo', 'Bar');

// file2.php
class A {
    public Foo $prop;
}

// file3.php
class B extends A {
    public Bar $prop;
}
This is subject to the usual limitations affecting aliases during inheritance checks, such as Bug #76451.
When two traits imported in the same class define the same property, then their property types must match, similar to the existing requirement that the default value must be the same. As such, the following code is invalid:
trait T1 {
    public int $prop;
}

trait T2 {
    public string $prop;
}

class C {
    use T1, T2;
}
Default Values
Default values for typed properties have to match the type of the property. The only exception is that float properties also accept integer default values, consistent with the handling for parameter types.
Typed properties cannot have a null default value, unless the type is explicitly nullable (?Type). This is in contrast to parameter types, where a null default value automatically implies a nullable type. We consider this to be a legacy behavior, which we do not wish to support for newly introduced syntax.
The following code illustrates legal and illegal default values:
class Test {
    // Legal default values
    public bool $a = true;
    public int $b = 42;
    public float $c = 42.42;
    public float $d = 42; // Special exemption
    public string $e = "str";
    public array $f = [1, 2, 3];
    public iterable $g = [1, 2, 3];
    public ?int $h = null;
    public ?object $i = null;
    public ?Test $j = null;

    // These have *no* legal default values
    public object $k;
    public Test $l;

    // ILLEGAL default values
    public bool $m = 1;
    public int $n = null;
    public Test $o = null;
}
If the default value is a non compile-time evaluable initializer expression, the default value is not checked at compile-time. Instead it will be checked during constant-updating, which most commonly will occur when an object of the class is instantiated. As such, the following code is legal:
class Test {
    public int $prop = FOO;
}

define('FOO', 42);
new Test;
If the constant held an illegal type, a TypeError exception would be generated during the new Test instantiation.
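The failure case can be sketched as follows, assuming a constant that resolves to a value of the wrong type. The names Config and LIMIT are hypothetical; the point is that the error surfaces at instantiation rather than at class declaration.

```php
<?php
// Hypothetical names for illustration.
class Config
{
    // LIMIT is not known at compile time, so the check is deferred.
    public int $limit = LIMIT;
}

// Declaring the class succeeded; only now does the constant get a value.
define('LIMIT', 'not an int');

$caught = null;
try {
    new Config; // Constant-updating happens here and validates the default
} catch (TypeError $e) {
    $caught = $e;
}
```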
Uninitialized and Unset Properties
If a typed property does not have a default value, no implicit null default value is implied (even if the property is nullable). Instead, the property is considered to be uninitialized. Reads from uninitialized properties will generate a TypeError (unless __get() is defined, see next section).
class Test {
    public int $val;

    public function __construct(int $val) {
        $this->var = $val; // Ooops, typo
    }
}

$test = new Test(42);
var_dump($test->val); // TypeError
Uninitialized typed properties are indicated in var_dump output as follows:
object(Test)#1 (0) {
  ["val"]=>
  uninitialized(int)
}
This behavior ensures that uses of uninitialized properties can be caught quickly, without enforcing overly strict initialization requirements, such as requiring all properties to be initialized in the constructor. For a discussion of other approaches to handle initialization, see the Alternatives section.
If a typed property is unset(), then it returns to the uninitialized state. While we would love to remove support for the unsetting of properties, this functionality is currently used for lazy initialization by Doctrine, in combination with the functionality described in the following section.
Overloaded Properties
If a typed property is in the uninitialized state, either because it has not yet been initialized, or because it has been explicitly unset(), then reads from this property will invoke the __get() method if it exists, consistent with the behavior of ordinary properties.
This allows for the following lazy initialization pattern:
class Test {
    public $untyped;
    public int $typed;

    public function __construct() {
        unset($this->untyped);
        unset($this->typed); // Not strictly necessary, uninitialized by default
    }

    public function __get($name) {
        if ($name === 'untyped') {
            return $this->untyped = $this->computeValue1();
        } else if ($name === 'typed') {
            return $this->typed = $this->computeValue2();
        } else {
            throw new Exception("Unknown property \"$name\"");
        }
    }
}

$test = new Test;
var_dump($test->typed); // This calls __get()
var_dump($test->typed); // This doesn't, as property now initialized
In this case, the return value of __get() for a typed property must still satisfy the declared type of the property. As such, the following code produces a TypeError:
class Test {
    public int $val;

    public function __get($name) {
        return "not an int";
    }
}

$test = new Test;
var_dump($test->val); // TypeError
Not verifying the return value of __get() would allow the integer property $val to return a string value on read, which violates our fundamental invariant.
When checking the return type of __get() for a typed property, the strictness mode at the point of the __get() declaration is relevant, not the point where the property is read:
// file1.php
declare(strict_types=1);

class Test {
    public int $val;

    public function __get($name) {
        return "42";
    }
}

// file2.php
declare(strict_types=0);

$test = new Test;
var_dump($test->val); // TypeError, as __get() is in strict mode code
This behavior is used, because it is the __get() method that determines the (apparent) value of the property, not the read-site.
Indirect Modifications
Properties can be indirectly modified in a number of ways (not counting the use of references, which are discussed in the next section). Such indirect modifications are also subject to property type checks. For example:
class Test {
    public int $x;
}

$test = new Test;
$test->x = PHP_INT_MAX;
$test->x++; // TypeError
The $test->x++ line is roughly equivalent to $test->x = $test->x + 1 and the final assignment is type-checked as usual. Note that the above code will error also in coercive mode, because overflowing coercions from floats to integers are prohibited.
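A short sketch of two indirect modifications that fail the type check, run in coercive mode. The class Tally is a hypothetical name; the second case uses the .= operator, which is not mentioned above and serves only as an additional assumed illustration of an assign-modify operation.

```php
<?php
// Hypothetical class for illustration; runs in coercive mode.
class Tally
{
    public int $x = PHP_INT_MAX;
}

$tally = new Tally;

$overflowError = null;
try {
    $tally->x++; // The incremented result would be a float
} catch (TypeError $e) {
    $overflowError = $e;
}
$afterOverflow = $tally->x; // The property keeps its previous value

$tally->x = 5;
$concatError = null;
try {
    $tally->x .= "abc"; // "5abc" is not a valid int
} catch (TypeError $e) {
    $concatError = $e;
}
```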
References
Unlike the previous typed properties RFC, this RFC allows acquiring references to typed properties. While references in PHP are nowadays considered to be something of an antipattern and avoidance of their use is advisable, there are still many instances where their use cannot easily be avoided. For example, a blanket prohibition of references to typed properties would not allow the following code:
class Test {
    public array $ary = [3, 2, 1];
}

$test = new Test;
sort($test->ary);
The sort() function accepts the parameter by reference and modifies the array in place. Without support for references, one would be forced to write the following instead:
$test = new Test;
$ary = $test->ary;
sort($ary);
$test->ary = $ary;
While support for references may be desirable, it also poses a significant implementation challenge. Without special support a modification of a reference could change the type of a property, resulting in a violation of the type contract.
This RFC resolves this issue by tracking which typed properties are part of a reference, and enforcing their declared types whenever the reference is assigned to.
class Test {
    public int $x = 42;
}

$test = new Test;
$x =& $test->x;
$x = "foobar"; // TypeError
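The sketch below combines both sides of this behaviour in coercive mode: a write through the reference is coerced or rejected according to the property type, while by-reference functions such as sort() keep working. The class Box and the variable names are assumptions made for illustration.

```php
<?php
// Hypothetical class for illustration; runs in coercive mode.
class Box
{
    public int $value = 42;
    public array $items = [3, 2, 1];
}

$box = new Box;
$ref =& $box->value;

$ref = "7"; // Numeric string: coerced to int(7) before being stored
$afterCoercion = $box->value;

$error = null;
try {
    $ref = "foobar"; // Cannot be coerced to int
} catch (TypeError $e) {
    $error = $e;
}

sort($box->items); // By-reference argument to a typed array property
```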
As the semantics of references in PHP are somewhat involved, the following subsections will describe some special cases. The Alternatives section discusses different approaches to reference handling.
References in PHP
There are some common misunderstandings about how references in PHP work, which need to be clarified before we can discuss the interaction with typed properties.
Firstly, references in PHP are not directional. There is no concept of a “reference to”. References work by providing a shared storage location to which other storage locations may point. We will use the term “reference set” to refer to all storage locations which point to the same reference.
$a = 1;
$b =& $a;
$c =& $b;
In the above code, $a, $b and $c are part of one reference set. An assignment to any variable inside the reference set will affect the value of all the variables.
A reference set may contain only a single variable:
$a = 1;
$b =& $a;
unset($b);
After the unset() operation, the reference set only contains $a. PHP semantics follow the general principle that singleton reference sets should have no impact on behavior. Based on this, the PHP implementation may replace a singleton reference set with the contained variable (“unref operation”) during certain operations, which are not well-specified. In practice, singleton reference sets do exhibit some behavioral differences in some edge-cases.
General Semantics
Intuitively, assigning a value to a reference is equivalent to assigning it to all storage locations that are part of the reference set. The addition of typed properties preserves this intuition as follows:
If typed properties are part of the reference set, then the value is checked against each property type. If a type check fails, a TypeError is generated and the value of the reference remains unchanged.
There is one additional caveat: If a type check requires a coercion of the assigned value, it may happen that all type checks succeed, but result in different coerced values. As a reference can only have a single value, this situation also leads to a TypeError. (An example of this edge-case is shown later.)
In the following, some special cases and other behavioral details are discussed. It should also be noted that there are a number of other ways in which references to typed properties could behave, which are discussed in detail in the Alternatives section.
Multiple Typed Properties
A reference set may contain any number of storage locations, and as such may also contain more than one typed property. In this case the type of each property must be satisfied. The following example illustrates the behavior of creating a reference between four properties with increasingly specific type constraints:
class Test {
    public $a;
    public ?iterable $b;
    public ?array $c;
    public array $d;
}

$test = new Test;
$test->d = [];
$test->c =& $test->d;
$test->b =& $test->c;
$test->a =& $test->b;
// Reference set { $test->a, $test->b, $test->c, $test->d }
// Types { mixed, ?iterable, ?array, array }
// Effective type: array

unset($test->d);
// Reference set { $test->a, $test->b, $test->c }
// Types { mixed, ?iterable, ?array }
// Effective type: ?array

unset($test->c);
// Reference set { $test->a, $test->b }
// Types { mixed, ?iterable }
// Effective type: ?iterable

unset($test->b);
// Reference set { $test->a }
// Types { mixed }
// Effective type: mixed
The notion of an “effective type” here is illustrative only: the reference does not actually have a type itself; the type is merely imposed by the typed properties which are part of it. This effective type does not necessarily have to correspond to a real type supported by PHP's type declaration system, as the following example shows:
class Test {
    public Countable $c;
    public Iterator $i;
}

$test = new Test;
$test->c = new ArrayIterator;
$test->i =& $test->c;
// Reference set { $test->c, $test->i }
// Types { Countable, Iterator }
// Effective type: Countable&Iterator (satisfied by ArrayIterator)
The effective type Countable&Iterator is not currently supported by PHP, but as references operate on individual property types, this is not a problem. Another somewhat peculiar example is the following:
class Test {
    public ?int $i;
    public ?string $s;
}

$test = new Test;
$r = null;
$test->i =& $r;
$test->s =& $r;
// Reference set { $r, $test->i, $test->s }
// Types { mixed, ?int, ?string }
// Effective type: null
Here there is only a single value which can satisfy both types, which is null. While this makes intuitive sense (the intersection of ?int and ?string is null), it is instructive to go through this example by following the rules outlined in “General Semantics” exactly:
If we perform an assignment like $r = "42", then this passes the ?string type on $test->s without coercion. It also passes the ?int type on $test->i by coercing the value to int(42). However, this means that both assignments would yield different values, string(2) "42" in one case and int(42) in the other. As such, a TypeError is thrown.
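The walk-through above can be reproduced with a short sketch in coercive mode. The class Pair is a hypothetical name, and its properties are given null defaults purely to keep the example self-contained.

```php
<?php
// Hypothetical class for illustration; runs in coercive mode.
class Pair
{
    public ?int $i = null;
    public ?string $s = null;
}

$pair = new Pair;
$r = null;
$pair->i =& $r;
$pair->s =& $r;
// Reference set { $r, $pair->i, $pair->s }

$error = null;
try {
    // Satisfies ?string as-is, but ?int only via coercion to int(42):
    // the two checks disagree on the resulting value.
    $r = "42";
} catch (TypeError $e) {
    $error = $e;
}
```

After the failed assignment the reference still holds null, the only value that both property types accept unchanged.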
Future Interaction with Union Types
While PHP currently does not support union types, there is an interesting interaction with union types that should be pointed out.
class Test {
    public int|string $x;
    public float|string $y;
}

$test = new Test;
$r = "foobar";
$test->x =& $r;
$test->y =& $r;
// Reference set: { $r, $test->x, $test->y }
// Types: { mixed, int|string, float|string }
// Effective type: ???

$r = 42; // TypeError
In this case, one might think that the effective type is string, which is one possible interpretation of the intersection of int|string and float|string. However, under this proposal the behavior is as follows:
The value 42 passes the int|string type check for $test->x without coercion. It also passes the type check float|string by coercing to 42.0. As the two values are not the same, a TypeError is generated.
The Alternatives section discusses one scheme where the assignment would be valid instead and assign the coerced value string(2) "42", which does not match the value of either property if the assignment happened independently (42 and 42.0 respectively).
By-Reference Overloaded Properties
As described in the main section on overloaded properties, if __get() is called for an uninitialized or unset property, the returned value must still satisfy the declared property type.
If __get() returns by reference, then the behavior stays the same. The type is enforced only instantaneously, as the typed property is not actually part of the reference set.
class Test {
    public int $val;
    public $dummy = 42;

    public function &__get($name) {
        return $this->dummy;
    }
}

$test = new Test;
$val =& $test->val; // Type checked here
$val = "not an int"; // Assignment is legal, reference itself has no type constraints
As a technical caveat, if __get() returns by reference, but is not used by reference (BP_VAR_R fetch), then the reference must be immediately unwrapped after the type check. Otherwise clever application of destructors would allow changing the value between the type check and the use of the value.
By-reference arguments to internal functions
Some internal functions accept parameters by reference, for the purpose of returning additional values through them (“out parameters”). Assignments to these reference parameters are subject to the usual type checks when property types are involved:
class Test {
    public int $x = 0;
    public string $y = "";
    public array $z = [];
}

$test = new Test;
str_replace("foo", "bar", "foofoofoo", $test->x);
var_dump($test->x); // int(3)
str_replace("foo", "bar", "foofoofoo", $test->y);
var_dump($test->y); // string(1) "3"
str_replace("foo", "bar", "foofoofoo", $test->z); // TypeError
In this case the type checks use the strictness mode of the caller. Repeating the previous example with strict_types=1 yields:
declare(strict_types=1);

$test = new Test;
str_replace("foo", "bar", "foofoofoo", $test->x);
var_dump($test->x); // int(3)
str_replace("foo", "bar", "foofoofoo", $test->y);
var_dump($test->y); // TypeError
This behavior is not entirely consistent with other type checks, in that we usually treat all internal code as being in coercive mode, independently of the strictness mode of the caller.
We deviate from this behavior in this instance, based on the mental model of out parameters. Semantically an out parameter is equivalent to a function with multiple return values:
$newStr = str_replace($a, $b, $c, $count);
// Semantically equivalent to
[$newStr, $count] = str_replace($a, $b, $c);
Under this desugaring, it would be expected that the assignment to $count respects the strictness mode of the caller.
Automatic Initialization
In PHP, creating a reference to an uninitialized storage location implicitly initializes it to the value null. For typed properties this behavior is preserved if the property is nullable. If the property is not nullable, then a TypeError is generated, as initializing the property to null would violate the type constraint:
class Test {
    public ?int $x;
    public int $y;
}

$test = new Test;
$x =& $test->x; // Initialized to null
$y =& $test->y; // TypeError
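A minimal sketch of the same two cases with the failure caught, so that the state afterwards can be inspected. The class Slots is a hypothetical name. The catch clause deliberately uses Error, the parent class of TypeError, so the sketch does not rely on the exact exception class a given PHP version raises here.

```php
<?php
// Hypothetical class for illustration.
class Slots
{
    public ?int $optional;
    public int $required;
}

$slots = new Slots;

// Nullable: taking the reference initializes the property to null.
$a =& $slots->optional;

$error = null;
try {
    // Not nullable: null would violate the declared type, so this throws.
    $b =& $slots->required;
} catch (Error $e) {
    $error = $e;
}
```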
Reflection
The ReflectionProperty class is extended by three methods:
class ReflectionProperty {
    // ...

    public function getType(): ?ReflectionType;
    public function hasType(): bool;
    public function isInitialized([object $object]): bool;
}
getType() returns a ReflectionType if the property has a type, and null otherwise. hasType() returns true if the property has a type, and false otherwise. The behavior matches that of getType()/hasType() for parameters and getReturnType()/hasReturnType() for return types.
isInitialized() returns whether the property is initialized. It will return false for typed properties prior to initialization and for properties that have been explicitly unset(). For all other properties true will be returned. Otherwise this method has the same error conditions as ReflectionProperty::getValue(). In particular $object must be passed for non-static properties and must be an instance of the class on which the property was declared, otherwise a ReflectionException is thrown. If the property is non-public and setAccessible(true) has not been called, a ReflectionException is thrown.
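The three methods can be exercised as in the following sketch. The class Point and the variable names are assumptions made for illustration; getName() is called on the returned type object on the assumption that it is a ReflectionNamedType, as is the case for simple types such as int.

```php
<?php
// Hypothetical class for illustration.
class Point
{
    public int $x;
    public $untyped;
}

$point = new Point;
$xProp = new ReflectionProperty(Point::class, 'x');

$hasType = $xProp->hasType();
$typeName = $xProp->getType()->getName();

$initBefore = $xProp->isInitialized($point); // No default, nothing assigned yet
$point->x = 1;
$initAfter = $xProp->isInitialized($point);
unset($point->x);
$initUnset = $xProp->isInitialized($point);  // Back to the uninitialized state

// Untyped properties have no type and start out initialized (to null).
$untypedProp = new ReflectionProperty(Point::class, 'untyped');
$untypedHasType = $untypedProp->hasType();
$untypedType = $untypedProp->getType();
$untypedInit = $untypedProp->isInitialized($point);
```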
Backward Incompatible Changes
None.
Alternatives
Syntax
Given PHP's existing syntax for parameter and return type annotations, there are two obvious choices of syntax for property type annotations:
class Example {
    public int $num;
    public $num: int;
}
The former follows the syntax for parameter types, the latter follows return types. The first syntax is familiar from languages such as C, C++ or Java. The latter is adopted in some more modern languages, such as TypeScript or Rust. Both syntax choices should be intuitive and familiar to modern-day programmers.
If the : Type notation is used, there are two potential ways to combine it with default values:
class Example {
    public $num: int = 42;
    public $num = 42: int;
}
In this case we would prefer the former syntax, as it is clearer, especially if the initializer expression is complex. For example, if the initializer is a multi-line array and the second form is used, the type would only appear very far away from the property name. Additionally the combination with the ternary operator (FOO ? BAR : BAZ : int), while technically conflict-free, would be confusing.
PHP also supports simultaneous declaration of multiple properties. In this case the syntactical implication of the : Type notation would require repeating the type for each property:
class Example {
    // Prefix notation
    public int $x, $y, $z;

    // Suffix notation
    public $x: int, $y: int, $z: int;
}
This is somewhat inconsistent with the visibility modifier, which applies to all properties in the declaration.
One case where the suffix notation seems to be clearer is when used with the legacy var syntax:
class Example {
    var int $num;
    var $num: int;
}
For the prefix notation, it would be possible to permit omitting var in this case:
class Example {
    int $num;
    // would be equivalent to
    public int $num;
}
However, given the strong preference towards explicitly specifying the property visibility in modern PHP code, the addition of this reduced syntax does not appear worthwhile. This proposal only supports type declarations on var for syntactical consistency, but does not anticipate this form to be used to any significant degree.
Overall, we hold that the prefix syntax integrates slightly more seamlessly into the existing syntax of property declarations.
Callable Type
This proposal for typed properties does not support the callable type due to its context-dependent behavior. Repeating the example from the “Supported Types” section:
class Test {
    public callable $cb;

    public function __construct() {
        // $this->cb is callable here
        $this->cb = [$this, 'method'];
    }

    private function method() {}
}

$obj = new Test;
// $obj->cb is NOT callable here
($obj->cb)();
Here, the value is legal at the time of write, but not at the time of read. This RFC proposes to avoid this issue by prohibiting the callable type. Closure and Closure::fromCallable() can be used as a robust alternative, where callability is scope-independent.
There are a number of possible alternative resolutions, which will be discussed in the following.
The first option is to ignore the issue and simply allow a non-callable value to be returned. This is effectively the option that was implemented for return types. A method with callable return type will happily return a callable to a private method, even though the return value will not be actually callable at the call-site. In terms of overall ergonomics, this option may very well be the best, as most uses of the callable type will not run into the above issue, and callable generally only provides the weak guarantee that something is callable at the time of type-checking, but not necessarily at the time when the actual call is performed. However, we feel that taking this option would go against the fundamental invariant we are trying to establish, namely that values read from a typed property always satisfy the type constraint.
The second option is to perform type-checks both when writing and reading a property, as opposed to the current situation where checks are only performed on write. This would make the behavior distinctly odd, but at least type-safe, as the $obj->cb read in the above example would generate a TypeError.
There are two primary arguments against this option: The first is that this carries a surprise factor, in that you can have a fully and legally initialized object, which still throws when accessing properties. The second is that this would add an unnecessary performance impact to all property reads, just to handle this special case. The current implementation goes through some effort to make sure that we only need to check types during property writes, which tend to be rarer than reads and more amenable to inference-based optimizations.
The third option is to take the visibility of the property into account when performing the callability check. That is, if the property is public callable $cb
, then only callables that are callable from public scopes will be considered callable. If the property is private callable $cb
, then private methods will also be accepted.
The advantage of this solution is that it is quite ergonomic and even solves a part of the overall problem of callable
. The disadvantage is, apart from introducing special behavior that callable
does not exhibit elsewhere, that this creates a tight coupling between the visibility of the property and its type. For example, this means that increasing the visibility of a property in an inheriting class (protected callable $cb
to public callable $cb
), an operation that is otherwise always legal, would not be permissible for callable
properties. Even without inheritance, changing a private callable property into a protected one could require further code modifications, as existing assignments to the property may no longer be legal.
The fourth option is to automatically wrap assignments to callable
properties into Closure::fromCallable()
. This would ensure that any values that were callable at the time of write would remain callable at the time of read. However, if we would like to introduce such a behavior, we believe that it should be introduced for all places where the callable
type may occur, not just typed properties. Furthermore, performing this automatic wrapping would further increase the cost of callable
types.
As there are many different options to choose from, we consider it best to go with the conservative choice of prohibiting callable
types for the time being. This choice allows switching to any of the other variants at a later point in time, without introducing a backwards compatibility break.
Property Initialization
Under this RFC, properties that do not have a default value are considered uninitialized, with reads from uninitialized properties resulting in a TypeError
. There are some alternative ways in which property initialization could be handled.
The first would be to mirror the behavior of untyped properties and use an implicit null
default value when no explicit default is given.
class Test { public SomeClass $val; // Would be equivalent to public ?SomeClass $val = null; }
While this provides consistency with untyped properties, it has the obvious and quite disastrous disadvantage that null
becomes a valid type for all properties that do not have or cannot have a meaningful default value, even though the value null
will only ever occur prior to initialization. We believe that this is a very common case, and that adding implicit nullability will greatly hurt the expressibility of the type-system.
An intermediate option would be to leave non-nullable properties uninitialized, but give nullable properties an implicit null
default value. This option at least does not compromise the expressiveness of the type system, though it does introduce an inconsistency between the handling of nullable and non-nullable typed properties.
Furthermore, we consider having a distinct uninitialized state to also have value for nullable properties. Most properties, including nullable ones, are expected to be explicitly initialized in the constructor. Failing to do so (e.g. due to a typo) should not result in an undetectable, implicit null
value. Finally, while it is always easy to add an explicit null
default value, PHP has no initializer syntax for an uninitialized value (there is no public Type $prop = unset
notation, or similar). If uninitialized is not the default state, then there is no way to opt-in to it.
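The distinction between an uninitialized property and a null default can be sketched as follows. The class Account and its property names are hypothetical. The example catches Error, the base class of TypeError, so it does not depend on the exact exception class the engine raises.

```php
<?php

class Account
{
    public int $id;               // no default: starts out uninitialized
    public ?string $label;        // nullable, but still uninitialized (not null)
    public ?string $note = null;  // explicit opt-in to a null default
}

$account = new Account();

// ReflectionProperty::isInitialized() reports the state without reading the value.
$labelProperty = new ReflectionProperty(Account::class, 'label');
$labelInitialized = $labelProperty->isInitialized($account);

try {
    $value = $account->label;     // read of an uninitialized property
    $readFailed = false;
} catch (Error $e) {
    $readFailed = true;
}

$note = $account->note;           // fine: explicitly initialized to null
```

A forgotten assignment to $label in a constructor therefore surfaces as an error on first read, rather than as a silent null.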
Another alternative that has been proposed is to require that the constructor initializes all typed properties:
class Point {
    public float $x;
    public float $y;

    public function __construct(float $x, float $y) {
        $this->x = $x;
        return; // Throws TypeError, because Point::$y is uninitialized
    }
}
The advantage of this scheme is that initialization errors are reported during object construction, rather than the first time an uninitialized property is accessed. At the same time, it imposes restrictions on the code patterns that can be used. For example, the following code using a named constructor would not be legal:
class Point { public float $x, $y; private function __construct() {} public static function fromEuclidean(float $x, float $y) { $point = new Point; $point->x = $x; $point->y = $y; return $point; } }
Of course, this code can be rewritten to indirect through __construct()
instead.
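One possible shape of such a rewrite is sketched below, assuming a scheme in which the constructor must initialize every typed property. The named constructor simply forwards its arguments to the private constructor.

```php
<?php

class Point
{
    public float $x;
    public float $y;

    // All typed properties are assigned before the constructor returns.
    private function __construct(float $x, float $y)
    {
        $this->x = $x;
        $this->y = $y;
    }

    // The named constructor delegates instead of assigning properties itself.
    public static function fromEuclidean(float $x, float $y): self
    {
        return new self($x, $y);
    }
}

$point = Point::fromEuclidean(1.0, 2.0);
```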
Another issue is that this does not really solve the problem of uninitialized properties, as they can still be uninitialized inside the constructor. While we can statically detect this in some cases, this is not possible in others:
class Point {
    public float $x, $y;

    public function __construct(float $x, float $y) {
        $this->doSomething();
        $this->x = $x;
        $this->y = $y;
    }
}
Here, we will not generally know whether $this->doSomething()
accesses the typed properties. We would either have to restrict the functionality usable in a constructor to an unreasonable degree, or else still allow (throwing) reads from uninitialized properties in the constructor.
Finally, because we still support unsetting of properties, we need to deal with uninitialized properties due to that, so all these measures do not really improve the situation.
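A short sketch of this last point: a property that was initialized during construction can be returned to the uninitialized state at any later time. The class name is illustrative, and Error is caught as the common base class.

```php
<?php

class Point
{
    public float $x = 0.0;
}

$point = new Point();
$before = $point->x;      // initialized through the default value

unset($point->x);         // the property is uninitialized again

try {
    $unused = $point->x;  // read after unset
    $readFailed = false;
} catch (Error $e) {
    $readFailed = true;
}

$point->x = 1.5;          // assigning re-initializes the property
```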
References
This RFC proposes to handle references to typed properties by tracking which typed properties are part of the reference set and checking against each type individually, while making sure that value coercions do not result in inconsistent values. In the following, alternative ways of handling references are discussed.
Forbid references
The previous RFC on typed properties did not permit acquiring references to typed properties. This has the significant disadvantage of creating an inconsistency and segregating the language. You can have typed properties, you can have references, but you can't have both.
Specifically, as already mentioned in the main section, it prevents use of internal functions that accept parameters by reference, such as sort
. We've also been assured that references to properties play some important and irreplaceable role for proxy objects in Doctrine, though the details remain elusive.
On the other hand, supporting references to typed properties makes the proposal significantly more complicated. A very large fraction of this proposal text is concerned with the behavior of references, and the large design space surrounding them. Additionally, handling of references also makes up most of the implementation complexity of the proposal.
Check types on read
An alternative to the general “typed references” approach pursued by this RFC would be to allow arbitrary changes to references and only validate the type when the affected property is read next:
class Test {
    public int $x = 42;
}

$test = new Test;
$x =& $test->x;
$x = "not an int"; // allowed
var_dump($test->x); // TypeError
The advantage of this approach is that it reduces the technical complexity, as there is essentially only a single place where we have to perform an additional type check. This proposal generally avoids type checks on reads in part for performance reasons, but in this specific case (where the extra type check would only be necessary if the property contains a reference) the performance issue would be less critical.
The main disadvantage is that the point where the TypeError is generated is now removed from the point where the illegal value has been assigned. One of the main motivations for having type checks in the first place is to detect errors early and pinpoint their cause: Passing an invalid value to a parameter will quite likely result in an error at some point, but determining the cause may not be easy. If the parameter is typed, then the error occurs directly at the point where the mistake was made.
The same also applies to references to typed properties, maybe even to a higher degree, as references perform “spooky action at a distance”. For this reason, we believe it is important to report errors as early as possible, instead of delaying them until the next read.
Additionally, checking types on read has unfortunate interactions with coercive typing:
class Test {
    public int $x = 42;
}

$test = new Test;
$x =& $test->x;
$x = "24";
var_dump($x); // string(2) "24"
var_dump($test->x); // int(24)
In this case $test->x
and $x
point to the same reference, but reads from them would produce different values.
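For contrast, here is a minimal sketch of how the approach adopted by this RFC treats the same assignments. It assumes the file runs in the default (weak) typing mode, so the numeric string is coerced when it is assigned through the reference.

```php
<?php

class Test
{
    public int $x = 42;
}

$test = new Test();
$x =& $test->x;

// Checked and coerced at the point of assignment, so the reference
// and the property always agree on a single value.
$x = "24";
$afterCoercion = $x;

try {
    $x = "not an int";   // rejected immediately instead of on the next read
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

Both the error location and the stored value are determined at the write, which avoids the divergence between $x and $test->x described above.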
References with intrinsic type
The current proposal is careful to avoid assigning an intrinsic type to references. While we have introduced the notion of an “effective type” for illustrative purposes, references simply track which typed properties are part of them without having a type themselves. This alternative (which was proposed by an earlier version of the RFC) instead gives references an intrinsic type, which is the intersection type of all property types currently contained within it.
Under this approach, a TypeError
is generated either if the computed intersection type is empty (e.g., int
and string
have empty intersection), or if the intersection cannot be represented in PHP's current type system (Countable
and Traversable
have intersection Countable&Traversable
, but this is currently not a legal type).
Intersection types are computed as follows (through pairwise reduction):
- If both types are nullable, the intersection is nullable. Otherwise it is not nullable.
- If both types are class types A and B: If A instanceof B the type is A. If B instanceof A the type is B. Otherwise the intersection is not representable (A&B).
- If one type is array and the other is iterable, the intersection is array.
- If one type is object and the other is A, then the intersection is A.
- If one type is iterable and the other is A where A instanceof Traversable, then the intersection is A.
- If one type is iterable and the other is object, then the intersection is Traversable.
- Otherwise, the intersection is empty.
Introducing proper intersection types would make it possible to represent the A&B case. As PHP has no proper null type (in the type annotation system), the intersection of something like ?int and ?string is considered to be empty, even though the value null would satisfy both types.
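The pairwise reduction rules above can be sketched in userland PHP as follows. This is only an illustration: the function names intersectTypes, intersectBare and isClassType are hypothetical, types are passed as plain strings, and the sketch returns null both for an empty and for a non-representable intersection. Treating two identical types as their own intersection is an assumption that the rules leave implicit.

```php
<?php

// Hypothetical helpers, not part of PHP or of the RFC implementation.

function isClassType(string $type): bool
{
    return class_exists($type) || interface_exists($type);
}

// Applies the rules in one direction; the caller tries both orders.
function intersectBare(string $a, string $b): ?string
{
    if (strcasecmp($a, $b) === 0) {
        return $a; // assumption: identical types intersect to themselves
    }
    if ($a === 'array' && $b === 'iterable') {
        return 'array';
    }
    if ($a === 'object' && $b === 'iterable') {
        return 'Traversable';
    }
    if (isClassType($a)) {
        if ($b === 'object') {
            return $a;
        }
        if ($b === 'iterable') {
            return is_a($a, 'Traversable', true) ? $a : null;
        }
        if (isClassType($b) && is_a($a, $b, true)) {
            return $a; // A instanceof B: the more specific type wins
        }
    }
    return null; // empty or not representable
}

function intersectTypes(string $a, string $b): ?string
{
    // The result is nullable only if both inputs are nullable.
    $nullable = $a[0] === '?' && $b[0] === '?';
    $a = ltrim($a, '?');
    $b = ltrim($b, '?');

    $result = intersectBare($a, $b) ?? intersectBare($b, $a);
    if ($result === null) {
        return null;
    }
    return ($nullable ? '?' : '') . $result;
}
```

With these definitions, intersectTypes('Countable', 'Iterator') and intersectTypes('?int', '?string') both yield null, matching the two problem cases described in the text.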
Under this alternative, the following two code examples would no longer work:
class Test {
    public Countable $c;
    public Iterator $i;
}

$test = new Test;
$test->c = new ArrayIterator;
$test->i =& $test->c; // TypeError
// Reference set { $test->c, $test->i }
// Intersection type: Countable&Iterator (not representable!)
class Test {
    public ?int $i;
    public ?string $s;
}

$test = new Test;
$r = null;
$test->i =& $r;
$test->s =& $r;
// Reference set { $r, $test->i, $test->s }
// Intersection type: null (not representable!)
Furthermore, the behavior with regard to union types would change. Reproducing the example from the main body of the RFC:
class Test { public int|string $x; public float|string $y; } $test = new Test; $r = "foobar"; $test->x =& $r; $test->y =& $r; // Reference set: { $r, $test->x, $test->y } // Intersection type: string $r = 42; // Current RFC: TypeError. This alternative: Legal! // Results in $test->x == $test->y == "42"
Under the current RFC, this code generates a TypeError
, as assigning 42
to $test->x
would result in value 42
, while assigning it to $test->y
would yield 42.0
. As both values are inconsistent, a TypeError
is thrown.
Under this alternative, we would instead compute the intersection of int|string and float|string to be string, and assigning 42 would result in the coerced value "42". Note that this does not match either of the values that would result from direct assignment.
Because this approach both carries artificial restrictions (Countable&Iterator
not supported) and results in this unfortunate behavior for union types, we decided to move away from this alternative.
Reference types independent of property lifetime
While the previous alternative assigns references an intrinsic type, it also keeps track of which properties are part of the reference set. If a typed property is removed from a reference set, then so is the type constraint that it imposes. (The intersection type needs to be recomputed whenever a typed property is added to or removed from the reference set.)
An alternative approach would be to make the type of the reference independent of the typed property. If the reference was constructed with a type, it would retain that type even if the typed property is removed from the reference set:
class Test { public int $x = 42; } $test = new Test; $x =& $test->x; // Reference created with type int $y =& $x; unset($test); // Reference still has type int $x = "not an int"; // TypeError
A primary issue with this approach is that the behavior becomes fragile when singleton reference sets are involved, as demonstrated by the following example.
class Test { public int $x = 42; } $test = new Test; $array = [&$test->x]; unset($test); // $array[0] is reference with rc=1 // var_dump($array[0]); // Toggle me $array[0] = "not an int";
In this case a TypeError
would be generated without the var_dump
, but not with it. The reason is that the access $array[0]
performs an unref operation under the current implementation. Generally, this scheme violates the principle that singleton reference sets are transparent.
A way to resolve this problem would be to not enforce the type for singleton reference sets. Even so, behavior inconsistencies could still arise. If a new reference to $array[0]
is created, then behavior would differ based on whether this happened before or after the above var_dump
call.
One advantage of uncoupled reference types is technical in nature: To support the behavior proposed in this RFC, each reference has to keep track of which typed properties are part of it. As we don't expect reference sets with multiple typed properties to be common, we don't consider this to be particularly problematic, but it does add additional overhead.
Another advantage is that removal of a property from a reference set, which may occur at a distance, may no longer influence local behavior. Consider the following example:
class Test { public int $x = 0; public Test $cycle; } $test = new Test; $test->cycle = $test; $x =& $test->x; unset($test); $x = "foo"; // This proposal: TypeError. Uncoupled types: TypeError gc_collect_cycles(); $x = "foo"; // This proposal: Legal. Uncoupled types: TypeError
Under the current proposal, garbage collection (which can be triggered as a side-effect of many operations) removes the typed property from the reference set, changing assignment behavior. If reference types are uncoupled, this cannot occur. On the other hand, GC can already result in many other side-effects (in the case of destructors arbitrary ones), so this is not really new.
Impact on extensions
Ordinary extensions (i.e. excluding anything very tightly coupled to the engine like xdebug) will have to make two changes to properly support this RFC. One is a change to the write_property signature, which has to be made to ensure the extension compiles without warnings. The other is a change to how references are assigned. This second change is not necessary to build against PHP 7.4, but is needed to ensure the behavior is fully correct. The following sections contain detailed porting instructions.
Assignments to references
All assignments to references may now require a type check and a coercion of the assigned value. This includes assignments performed inside internal functions. Previously, such assignments used a pattern such as the following:
ZVAL_DEREF(zv); zval_ptr_dtor(zv); ZVAL_LONG(zv, num);
Code like this needs to be replaced by the following:
ZEND_TRY_ASSIGN_LONG(zv, num);

// Shorthand for:
zval num_zv;
ZVAL_LONG(&num_zv, num);
zend_try_assign(zv, &num_zv);

The ZEND_TRY_ASSIGN_LONG macro will generate a TypeError if the reference assignment fails. Similar macros are available for other types as well.
Typically the ZVAL_DEREF()
in the above example is already performed as part of parameter parsing, in the form of a z/
or ZEND_PARAM_ZVAL_DEREF
argument. Examples for both are shown in the following:
// Replace
PHP_FUNCTION(test) {
    zval *arg;
    if (zend_parse_parameters(ZEND_NUM_ARGS(), "z/", &arg) == FAILURE) {
        return;
    }
    zval_ptr_dtor(arg);
    ZVAL_LONG(arg, 0);
}

// With
PHP_FUNCTION(test) {
    zval *arg;
    if (zend_parse_parameters(ZEND_NUM_ARGS(), "z", &arg) == FAILURE) {
        return;
    }
    ZEND_TRY_ASSIGN_LONG(arg, 0);
}
// Replace
PHP_FUNCTION(test) {
    zval *arg;
    ZEND_PARSE_PARAMETERS_START(1, 1)
        ZEND_PARAM_ZVAL_DEREF(arg)
    ZEND_PARSE_PARAMETERS_END();
    zval_ptr_dtor(arg);
    ZVAL_LONG(arg, 0);
}

// With
PHP_FUNCTION(test) {
    zval *arg;
    ZEND_PARSE_PARAMETERS_START(1, 1)
        ZEND_PARAM_ZVAL(arg)
    ZEND_PARSE_PARAMETERS_END();
    ZEND_TRY_ASSIGN_LONG(arg, 0);
}
For arrays, the zval additionally has to be dereferenced after the assignment, so that further operations can be performed on the array. The zend_try_array_init() and zend_try_array_init_size() functions simplify this case:
// Replace
PHP_FUNCTION(test) {
    zval *arg;
    if (zend_parse_parameters(ZEND_NUM_ARGS(), "z/", &arg) == FAILURE) {
        return;
    }
    zval_ptr_dtor(arg);
    array_init(arg);
    // Lots of array operations here
}

// With
PHP_FUNCTION(test) {
    zval *arg;
    if (zend_parse_parameters(ZEND_NUM_ARGS(), "z", &arg) == FAILURE) {
        return;
    }
    arg = zend_try_array_init(arg);
    if (!arg) {
        return;
    }
    // Lots of array operations here
}
// Replace
PHP_FUNCTION(test) {
    zval *arg;
    ZEND_PARSE_PARAMETERS_START(1, 1)
        ZEND_PARAM_ZVAL_DEREF(arg)
    ZEND_PARSE_PARAMETERS_END();
    zval_ptr_dtor(arg);
    array_init(arg);
    // Lots of array operations here
}

// With
PHP_FUNCTION(test) {
    zval *arg;
    ZEND_PARSE_PARAMETERS_START(1, 1)
        ZEND_PARAM_ZVAL(arg)
    ZEND_PARSE_PARAMETERS_END();
    arg = zend_try_array_init(arg);
    if (!arg) {
        return;
    }
    // Lots of array operations here
}
Please note that the return value of zend_try_array_init()
must be checked if you plan to continue working on the array. Otherwise, there would be no guarantee that arg
is actually an array.
Places where such replacements are necessary can be detected by searching for z/
, ZEND_PARAM_ZVAL_DEREF
and ZVAL_DEREF
in the extension. Alternatively or additionally, function parameters that are declared as by-reference in arginfo can be reviewed.
To avoid having to write different code for different PHP versions, we provide a shim header that emulates all the necessary macros on older versions of PHP: https://gist.github.com/nikic/bb874084aaa0af6d56d6ab491a6d630b
write_property handler
Under this RFC, assignments to typed properties return the final, potentially coerced value. This requires a change to the signature of the write_property
object handler:
// OLD
typedef void (*zend_object_write_property_t)(zval *object, zval *member, zval *value, void **cache_slot);

// NEW
typedef zval *(*zend_object_write_property_t)(zval *object, zval *member, zval *value, void **cache_slot);
In most cases the return value should simply be value
, or the return value of an inner zend_std_write_property()
call. The following example shows the necessary changes for a typical extension write_property
handler:
// Replace
void foobar_write_property(zval *object, zval *member, zval *value, void **cache_slot) {
    if (special_property) {
        do_special_handling();
    } else {
        zend_std_write_property(object, member, value, cache_slot);
    }
}

// With
zval *foobar_write_property(zval *object, zval *member, zval *value, void **cache_slot) {
    if (special_property) {
        do_special_handling();
    } else {
        value = zend_std_write_property(object, member, value, cache_slot);
    }
    return value;
}
The different implementations need to be appropriately guarded by PHP version.
Performance
Dmitry has performed some benchmarks of the current typed properties implementation. The results are available at https://gist.github.com/dstogov/b9fc0fdccfb8bf7bae121ce3d3ff1db1. (Note: The reported MediaWiki result was not reproducible on repeated runs, the actual slowdown seems to be about 2-3%.)
The main takeaway is that typed properties have an impact on performance even if they are not used. For applications, the impact is around 1-2%. We expect that performance will be improved prior to landing, but at least some impact is probably not avoidable.
Vote
As this is a language change, a 2/3 majority is required.
Errata
During final implementation work after the RFC was accepted, a number of cases were encountered which weren't explicitly specified in the original RFC text. They are documented here instead.
Automatic promotion of arrays and objects
PHP automatically initializes falsy values that are used as arrays or objects to an empty array or stdClass
respectively. When this happens with a typed property or a reference to a typed property, the property type must be compatible with an array or stdClass object.
class Test { public ?int $prop = null; } $test = new Test; $test->prop[] = 123; // TypeError, because we can't assign array to ?int $test->prop->foobar = 123; // TypeError, because we can't assign stdClass to ?int $prop =& $test->prop; $prop[] = 123; // TypeError, because we can't assign array to ?int $prop->foobar = 123; // TypeError, because we can't assign stdClass to ?int
Strictness of runtime-evaluated default values
Default values for both parameters and properties always follow the strict typing semantics, independently of the strict typing mode that applies in a particular file. However, there is one exception to this rule: If a constant expression parameter default value cannot be evaluated during compilation, it follows the strictness mode of the file instead:
function foo(int $x = FOO) { // currently allowed var_dump($x); } define('FOO', '42'); foo();
For typed properties we do not make such an exception, and the following code will generate a TypeError:
class Test { public int $x = FOO; // TypeError } define('FOO', '42'); var_dump(new Test);
The reason for this choice is that evaluation of constant expressions at compile-time vs run-time is an optimization choice and should not result in behavioral differences. Whether a constant expression is evaluated during compilation depends on many factors, including code order and whether or not opcache is enabled. We believe that the current behavior of parameters is a bug, not an intentional choice.
Incrementing/decrementing beyond the maximal/minimal value
When a value is incremented beyond PHP_INT_MAX
or decremented beyond PHP_INT_MIN
it is converted into a floating-point value and incremented/decremented as a floating-point value. Additionally, under PHP's type verification rules (both strict and weak), assigning an out-of-range floating point value to an integer is illegal.
As stated, this would result in the following peculiar behavior: Incrementing an int
property past the maximal value would always be an error, because (float)PHP_INT_MAX + 1
exceeds the integer range. However, decrementing an int
property past the minimal value would only error on 32-bit systems. The reason is that on 64-bit systems (float)PHP_INT_MIN - 1
is the same as (float)PHP_INT_MIN
, which is accurately representable as a double-precision floating point number and as such can be assigned back to an int
property without error.
As such, we would always generate an error on increment/decrement overflow, apart from the case of decrements on 64-bit systems.
To avoid this, we instead define that incrementing/decrementing an int
property past the maximal/minimal value always generates an error. It should be noted that this only affects the ++
and --
operators. Overflows caused by other means are not handled specially.
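A minimal sketch of the rule for the increment case follows. The class name Counter is hypothetical.

```php
<?php

class Counter
{
    public int $value = PHP_INT_MAX;
}

$counter = new Counter();

try {
    // Incrementing would produce a float, which the int property cannot hold.
    $counter->value++;
    $incrementFailed = false;
} catch (TypeError $e) {
    $incrementFailed = true;
}
```

An untyped property in the same situation would silently become a float, which is why the typed case needs an explicit rule.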
Changelog
Significant changes to the RFC are noted here.
- 2019-01-07: Add errata: Increment/decrement overflow behavior.
- 2019-01-03: Add errata: Strictness of runtime-evaluated default values.
- 2019-01-03: Add errata: Automatic promotion of arrays and objects.
- 2018-07-16: Add note about compatibility requirement on properties coming from traits.
- 2018-07-10: Add shim header to make porting extensions easy.
- 2018-07-10: Note that write_property signature has changed.
- 2018-07-10: Switch reference to the “no intrinsic type” alternative.
- 2018-07-03: The zpp t specifier has been removed and replaced by zend_try_array_init(). This is friendlier to extensions.
- 2018-07-02: Add ReflectionProperty::isInitialized() method.
- 2018-06-21: Explicitly mention that invariance also applies to static properties, and justify why.